-
Notifications
You must be signed in to change notification settings - Fork 83
feat(deployment): Add --setup-only flag to start-clp.sh to set up the package without starting components (resolves #1475).
#1502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… without starting services (resolves y-scope#1475).
|
Warning Rate limit exceeded@junhaoliao has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 5 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughRenames two abstract methods in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Script as start_clp.py
participant Controller as DockerComposeController
participant Docker as Docker Compose
rect rgb(230,245,230)
Note over User,Script: Normal flow
User->>Script: ./start-clp.sh
Script->>Controller: __init__()
Script->>Controller: set_up_env()
Controller->>Controller: generate/write env
Script->>Controller: start()
Controller->>Docker: docker-compose up
end
rect rgb(245,240,230)
Note over User,Script: Setup-only flow
User->>Script: ./start-clp.sh --setup-only
Script->>Controller: __init__()
Script->>Controller: set_up_env()
Controller->>Controller: generate/write env
Script->>User: exit (no start)
end
rect rgb(240,245,250)
Note over User,Script: Stop flow
User->>Script: ./start-clp.sh stop
Script->>Controller: __init__()
Script->>Controller: stop()
Controller->>Docker: docker-compose down
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def start(self) -> None: | ||
| """ | ||
| Starts CLP's components using Docker Compose. | ||
| :raise: Propagates `check_docker_dependencies`'s exceptions. | ||
| :raise: Propagates `subprocess.run`'s exceptions. | ||
| """ | ||
| check_docker_dependencies( | ||
| should_compose_project_be_running=False, project_name=self._project_name | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing environment setup before starting containers
The refactor made set_up_env() public but DockerComposeController.start() no longer calls it. The normal CLI path in start_clp.py only invokes controller.start() unless --setup-only is passed, so a default sbin/start-clp.sh run never writes the .env file or prepares per-component directories. Docker Compose will therefore launch with unset credentials and paths, causing startup to fail on a clean install. start() should still invoke set_up_env() when not in setup-only mode.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codex good point - i forgot to move controller.set_up_env() outside the if parsed_args.setup_only: block. Could you help me move the call so it appears right before the if parsed_args.setup_only: check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems i have too much stuff running on codex at the moment. i'll do the change on my own then
|
As discussed offline with @kirkrodrigues, multi-node deployment support (which was removed in #1178) will be reintroduced based on this PR. When re-adding it, we'll only need to provide documentation with setup instructions, as the package is already fully compatible with multi-node deployment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/clp-package-utils/clp_package_utils/controller.py (1)
645-723: Centralise directory creation in set_up_env for consistency and idempotence.Move creation of top‑level data/logs/tmp/output dirs into
set_up_env()so both the default start path and--setup-onlyshare a single source of truth.Apply:
@@ def set_up_env(self) -> None: - # Generate container-specific config. + # Ensure top-level directories exist. + self._clp_config.data_directory.mkdir(parents=True, exist_ok=True) + self._clp_config.logs_directory.mkdir(parents=True, exist_ok=True) + self._clp_config.tmp_directory.mkdir(parents=True, exist_ok=True) + self._clp_config.archive_output.get_directory().mkdir(parents=True, exist_ok=True) + self._clp_config.stream_output.get_directory().mkdir(parents=True, exist_ok=True) + + # Generate container-specific config. container_clp_config = generate_docker_compose_container_config(self._clp_config)components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
84-101: Callset_up_env()before starting services to ensure .env file is generated.The startup path breaks on clean installs. In normal operation (without
--setup-only),controller.start()is called at line 93 without first callingset_up_env(), so the.envfile is never written. Docker Compose will fail because it requires this file.Recommended: Move
set_up_env()intoDockerComposeController.start()so all callers are safe.Additional fixes:
- Lines 67 and 77: Replace bare
except:withexcept Exception:to avoid catching system exits and interrupts.- Line 95: Use
isinstance(ex, ValueError)instead oftype(ex) == ValueError.@@ - if parsed_args.setup_only: + if parsed_args.setup_only: controller.set_up_env() logger.info( "Completed setup. Services are not started because --setup-only was provided." ) return 0 controller.start() @@ - except: + except Exception: logger.exception("Failed to load config.") return -1 @@ - except: + except Exception: logger.exception("Failed to create necessary directories.") return -1 @@ - except Exception as ex: - if type(ex) == ValueError: - logger.error(f"Failed to start CLP: {ex}") - else: - logger.exception("Failed to start CLP.") + except ValueError as ex: + logger.error("Failed to start CLP: %s", ex) + except Exception: + logger.exception("Failed to start CLP.") return -1
♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/controller.py (1)
731-757: start() must prepare the environment before running Docker Compose.Without calling
set_up_env(), the.envfile and per‑component dirs aren’t prepared; clean installs will fail. Makestart()invokeself.set_up_env()before composing. Also switch thelogger.infoto parameterised logging.Apply:
@@ - def start(self) -> None: + def start(self) -> None: """ Starts CLP's components using Docker Compose. @@ - check_docker_dependencies( + # Ensure environment (.env, dirs, settings.json) is prepared. + self.set_up_env() + + check_docker_dependencies( should_compose_project_be_running=False, project_name=self._project_name ) - deployment_type = self._clp_config.get_deployment_type() - logger.info(f"Starting CLP using Docker Compose ({deployment_type} deployment)...") + deployment_type = self._clp_config.get_deployment_type() + logger.info("Starting CLP using Docker Compose (%s deployment)...", deployment_type)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
components/clp-package-utils/clp_package_utils/controller.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py(2 hunks)docs/src/user-docs/quick-start/clp-json.md(1 hunks)docs/src/user-docs/quick-start/clp-text.md(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
components/clp-package-utils/clp_package_utils/controller.py (2)
set_up_env(96-100)set_up_env(645-729)
components/clp-package-utils/clp_package_utils/controller.py (2)
components/clp-package-utils/clp_package_utils/general.py (3)
check_docker_dependencies(168-191)DockerComposeProjectNotRunningError(78-82)DockerDependencyError(53-54)components/clp-py-utils/clp_py_utils/clp_config.py (2)
get_deployment_type(770-774)DeploymentType(89-91)
🪛 Ruff (0.14.1)
components/clp-package-utils/clp_package_utils/controller.py
743-743: Logging statement uses f-string
(G004)
751-751: subprocess call: check for execution of untrusted input
(S603)
783-783: subprocess call: check for execution of untrusted input
(S603)
784-784: Starting a process with a partial executable path
(S607)
🔇 Additional comments (4)
docs/src/user-docs/quick-start/clp-text.md (1)
22-26: Docs tip reads well and matches the CLI flag.LGTM.
docs/src/user-docs/quick-start/clp-json.md (1)
20-24: Docs tip is clear and consistent with the new flag.LGTM.
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
40-44: Flag wiring and help text look correct.No concerns.
components/clp-package-utils/clp_package_utils/controller.py (1)
751-756: Note on subprocess lint warnings (S603/S607).Inputs are controlled and executed via argument lists; risk is acceptable here. No action required.
Also applies to: 783-787
| ) | ||
|
|
||
| deployment_type = self._clp_config.get_deployment_type() | ||
| logger.info(f"Starting CLP using Docker Compose ({deployment_type} deployment)...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Parameterise logs instead of f‑strings.
Satisfies Ruff G004 and avoids interpolation when log level is higher.
Apply:
- logger.info(f"Starting CLP using Docker Compose ({deployment_type} deployment)...")
+ logger.info("Starting CLP using Docker Compose (%s deployment)...", deployment_type)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger.info(f"Starting CLP using Docker Compose ({deployment_type} deployment)...") | |
| logger.info("Starting CLP using Docker Compose (%s deployment)...", deployment_type) |
🧰 Tools
🪛 Ruff (0.14.1)
743-743: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/controller.py around line 743:
the logger call uses an f-string which violates Ruff G004 and performs
interpolation even when the log level is higher; change it to a parameterised
log call by passing the format string and deployment_type as an argument (e.g.
logger.info("Starting CLP using Docker Compose (%s deployment)...",
deployment_type)) so interpolation is deferred to the logging framework.
| def _get_num_workers() -> int: | ||
| """ | ||
| :return: Number of worker processes to run. | ||
| """ | ||
| # This will change when we move from single to multi-container workers. See y-scope/clp#1424 | ||
| return multiprocessing.cpu_count() // 2 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Ensure at least one worker on single‑CPU hosts.
multiprocessing.cpu_count() // 2 can be 0. Floor to 1.
Apply:
- return multiprocessing.cpu_count() // 2
+ return max(1, multiprocessing.cpu_count() // 2)🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines 791
to 797, the function currently returns multiprocessing.cpu_count() // 2 which
can evaluate to 0 on single-CPU hosts; change the return to ensure at least one
worker (e.g. return max(1, multiprocessing.cpu_count() // 2)) so the function
never returns 0 and update the docstring if needed to reflect the minimum of 1
worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you're right. let's create an issue and fix in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junhaoliao, would you like me to create a GitHub issue to track this fix? I can include the relevant context about ensuring at least one worker on single-CPU hosts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the worker count fix
gh issue create \
--title "Fix worker count calculation to return at least one worker on single-CPU hosts" \
--body "## Problem
The \`_get_num_workers()\` method in \`components/clp-package-utils/clp_package_utils/controller.py\` currently returns \`multiprocessing.cpu_count() // 2\`, which evaluates to 0 on single-CPU hosts. This would prevent workers from starting properly.
## Location
\`components/clp-package-utils/clp_package_utils/controller.py\`, lines 791-797 (as of PR #1502)
## Impact
On systems with a single CPU core, no workers would be started, causing CLP components to fail.
## Solution
Ensure the method returns at least 1 worker by flooring the result to a minimum of 1.
## References
- Original discussion: https://github.com/y-scope/clp/pull/1502#discussion_r2462785178
- PR #1502: https://github.com/y-scope/clp/pull/1502
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1509 to track this fix. The issue includes:
- Description of the problem with
_get_num_workers()returning 0 on single-CPU hosts - Location in the codebase
- Impact on CLP components
- Suggested solution
- Backlinks to this PR and comment discussion
- Assigned to you
🧬 Code Graph Analysis Results
[Relevant code snippets that could help understand the file and user comment]
- File: components/clp-package-utils/clp_package_utils/general.py
- Snippet: Docker-related dependency checks and errors
- Lines 168-191
def check_docker_dependencies(should_compose_project_be_running: bool, project_name: str): """ Checks if Docker and Docker Compose are installed, and whether a Docker Compose project is running. :param should_compose_project_be_running: :param project_name: The Docker Compose project name to check. :raise DockerNotAvailableError: If any Docker dependency is not installed. :raise DockerComposeProjectNotRunningError: If the project isn't running when it should be. :raise DockerComposeProjectAlreadyRunningError: If the project is running when it shouldn't be. """ try: subprocess.check_output( ["docker", "--version"], stderr=subprocess.STDOUT, ) except subprocess.CalledProcessError as e: raise DockerNotAvailableError("docker is not installed or available on the path", e) from e is_running = _is_docker_compose_project_running(project_name) if should_compose_project_be_running and not is_running: raise DockerComposeProjectNotRunningError(project_name) if not should_compose_project_be_running and is_running: raise DockerComposeProjectAlreadyRunningError(project_name) - Snippet: Docker dependency error classes
- Lines 78-82
class DockerComposeProjectNotRunningError(DockerDependencyError): """Raised when a Docker Compose project is not running but should be.""" def __init__(self, project_name: str) -> None: super().__init__(errno.ESRCH, f"Docker Compose project '{project_name}' is not running.")- Lines 53-54
class DockerDependencyError(OSError): """Base class for errors related to Docker dependencies.""" - Snippet: Dumping shared container config
- Lines 361-369
def dump_shared_container_config(container_clp_config: CLPConfig, clp_config: CLPConfig): """ Dumps the given container config to `CLP_SHARED_CONFIG_FILENAME` in the logs directory, so that it's accessible in the container. :param container_clp_config: :param clp_config: """ dump_container_config(container_clp_config, clp_config, CLP_SHARED_CONFIG_FILENAME) - Snippet: Generate Docker Compose container config
- Lines 312-322
def generate_docker_compose_container_config(clp_config: CLPConfig) -> CLPConfig: """ Copies the given config and transforms mount paths and hosts for Docker Compose. :param clp_config: :return: The container config. """ container_clp_config = clp_config.model_copy(deep=True) container_clp_config.transform_for_container() return container_clp_config - Snippet: Determine CLP_HOME
- Lines 141-157
def get_clp_home(): # Determine CLP_HOME from an environment variable or this script's path clp_home = None if "CLP_HOME" in os.environ: clp_home = pathlib.Path(os.environ["CLP_HOME"]) else: for path in pathlib.Path(__file__).resolve().parents: if "lib" == path.name: clp_home = path.parent break if clp_home is None: raise ValueError("CLP_HOME is not set and could not be determined automatically.") elif not clp_home.exists(): raise ValueError("CLP_HOME set to nonexistent path.") return clp_home.resolve() - Snippet: Retention check utility
- Lines 668-675
def is_retention_period_configured(clp_config: CLPConfig) -> bool: if clp_config.archive_output.retention_period is not None: return True if clp_config.results_cache.retention_period is not None: return True return False - Snippet: Validation for DB config
- Lines 498-509
def validate_db_config( clp_config: CLPConfig, component_config: pathlib.Path, data_dir: pathlib.Path, logs_dir: pathlib.Path, ): if not component_config.exists(): raise ValueError(f"{DB_COMPONENT_NAME} configuration file missing: '{component_config}'.") _validate_data_directory(data_dir, DB_COMPONENT_NAME) _validate_log_directory(logs_dir, DB_COMPONENT_NAME) validate_port(f"{DB_COMPONENT_NAME}.port", clp_config.database.host, clp_config.database.port) - Snippet: Validation for MCP server config
- Lines 589-594
def validate_mcp_server_config(clp_config: CLPConfig, logs_dir: pathlib.Path): _validate_log_directory(logs_dir, MCP_SERVER_COMPONENT_NAME) validate_port( f"{MCP_SERVER_COMPONENT_NAME}.port", clp_config.mcp_server.host, clp_config.mcp_server.port ) - Snippet: Validation for Redis config
- Lines 518-531
def validate_redis_config( clp_config: CLPConfig, component_config: pathlib.Path, data_dir: pathlib.Path, logs_dir: pathlib.Path, ): if not component_config.exists(): raise ValueError( f"{REDIS_COMPONENT_NAME} configuration file missing: '{component_config}'." ) _validate_data_directory(data_dir, REDIS_COMPONENT_NAME) _validate_log_directory(logs_dir, REDIS_COMPONENT_NAME) validate_port(f"{REDIS_COMPONENT_NAME}.port", clp_config.redis.host, clp_config.redis.port) - Snippet: Validation for Queue config
- Lines 512-515
def validate_queue_config(clp_config: CLPConfig, logs_dir: pathlib.Path): _validate_log_directory(logs_dir, QUEUE_COMPONENT_NAME) validate_port(f"{QUEUE_COMPONENT_NAME}.port", clp_config.queue.host, clp_config.queue.port) - Snippet: Validation for Results Cache (Mongo) config
- Lines 545-562
def validate_results_cache_config( clp_config: CLPConfig, component_config: pathlib.Path, data_dir: pathlib.Path, logs_dir: pathlib.Path, ): if not component_config.exists(): raise ValueError( f"{RESULTS_CACHE_COMPONENT_NAME} configuration file missing: '{component_config}'." ) _validate_data_directory(data_dir, RESULTS_CACHE_COMPONENT_NAME) _validate_log_directory(logs_dir, RESULTS_CACHE_COMPONENT_NAME) validate_port( f"{RESULTS_CACHE_COMPONENT_NAME}.port", clp_config.results_cache.host, clp_config.results_cache.port, ) - Snippet: Validation for WebUI config
- Lines 577-586
def validate_webui_config( clp_config: CLPConfig, client_settings_json_path: pathlib.Path, server_settings_json_path: pathlib.Path, ): for path in [client_settings_json_path, server_settings_json_path]: if not path.exists(): raise ValueError(f"{WEBUI_COMPONENT_NAME} {path} is not a valid path to settings.json") validate_port(f"{WEBUI_COMPONENT_NAME}.port", clp_config.webui.host, clp_config.webui.port) - Snippet: Deployment type getters (from CLP config utils)
- Lines 770-774
def get_deployment_type(self) -> DeploymentType: if QueryEngine.PRESTO == self.package.query_engine: return DeploymentType.BASE else: return DeploymentType.FULL
- Snippet: Docker-related dependency checks and errors
…ure environment setup is always performed.
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments.
For the PR title, how about:
feat(deployment): Add `--setup-only` flag to `start-clp.sh` to set up the package without starting components (resolves #1475).
|
|
||
| @abstractmethod | ||
| def stop(self) -> None: | ||
| def set_up_env(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to put this above start? That's how it's ordered in DockerComposeController.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, it should show up above start() as well
components/clp-package-utils/clp_package_utils/scripts/start_clp.py
Outdated
Show resolved
Hide resolved
Co-authored-by: kirkrodrigues <[email protected]>
--setup-only flag to start-clp.sh for setup without starting services (resolves #1475).--setup-only flag to start-clp.sh to set up the package without starting components (resolves #1475).
Description
Checklist
breaking change.
Validation performed
Then performed a search of string "1" in the webui http://localhost:4000 and observed results displayed in the UI.
Summary by CodeRabbit
New Features
Documentation